Krasnoyarsk
- Asia > Bangladesh (0.04)
- Asia > Russia > Siberian Federal District > Krasnoyarsk Krai > Krasnoyarsk (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- (3 more...)
- Asia > Bangladesh (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
OptimalThinkingBench: Evaluating Over and Underthinking in LLMs
Aggarwal, Pranjal, Kim, Seungone, Lanchantin, Jack, Welleck, Sean, Weston, Jason, Kulikov, Ilia, Saha, Swarnadeep
Thinking LLMs solve complex tasks at the expense of increased compute and overthinking on simpler problems, while non-thinking LLMs are faster and cheaper but underthink on harder reasoning problems. This has led to the development of separate thinking and non-thinking LLM variants, leaving the onus of selecting the optimal model for each query on the end user. We introduce OptimalThinkingBench, a unified benchmark that jointly evaluates overthinking and underthinking in LLMs and also encourages the development of optimally-thinking models that balance performance and efficiency. Our benchmark comprises two sub-benchmarks: OverthinkingBench, featuring simple math and general queries in 72 domains, and UnderthinkingBench, containing 11 challenging reasoning tasks along with harder math problems. Using novel thinking-adjusted accuracy metrics, we extensively evaluate 33 different thinking and non-thinking models and show that no model is able to optimally think on our benchmark. Thinking models often overthink for hundreds of tokens on the simplest user queries without improving performance. In contrast, large non-thinking models underthink, often falling short of much smaller thinking models. We further explore several methods to encourage optimal thinking, but find that these approaches often improve on one sub-benchmark at the expense of the other, highlighting the need for better unified and optimal models in the future.
- Europe > Russia > Northwestern Federal District > Kaliningrad Oblast > Kaliningrad (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States (0.04)
- (9 more...)
- Leisure & Entertainment (1.00)
- Health & Medicine (1.00)
- Media > Music (0.94)
- Education (0.68)
- Asia > Bangladesh (0.04)
- Asia > Russia > Siberian Federal District > Krasnoyarsk Krai > Krasnoyarsk (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- (3 more...)
- Asia > Bangladesh (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
AI Agents in Cryptoland: Practical Attacks and No Silver Bullet
Patlan, Atharv Singh, Sheng, Peiyao, Hebbar, S. Ashwin, Mittal, Prateek, Viswanath, Pramod
The integration of AI agents with Web3 ecosystems harnesses their complementary potential for autonomy and openness, yet also introduces underexplored security risks, as these agents dynamically interact with financial protocols and immutable smart contracts. This paper investigates the vulnerabilities of AI agents within blockchain-based financial ecosystems when exposed to adversarial threats in real-world scenarios. We introduce the concept of context manipulation -- a comprehensive attack vector that exploits unprotected context surfaces, including input channels, memory modules, and external data feeds. Through empirical analysis of ElizaOS, a decentralized AI agent framework for automated Web3 operations, we demonstrate how adversaries can manipulate context by injecting malicious instructions into prompts or historical interaction records, leading to unintended asset transfers and protocol violations which could be financially devastating. Our findings indicate that prompt-based defenses are insufficient, as malicious inputs can corrupt an agent's stored context, creating cascading vulnerabilities across interactions and platforms. This research highlights the urgent need to develop AI agents that are both secure and fiduciarily responsible.
- Europe (0.14)
- Asia > Russia > Siberian Federal District > Krasnoyarsk Krai > Krasnoyarsk (0.04)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Trading (1.00)
VLMs as GeoGuessr Masters: Exceptional Performance, Hidden Biases, and Privacy Risks
Huang, Jingyuan, Huang, Jen-tse, Liu, Ziyi, Liu, Xiaoyuan, Wang, Wenxuan, Zhao, Jieyu
Visual-Language Models (VLMs) have shown remarkable performance across various tasks, particularly in recognizing geographic information from images. However, significant challenges remain, including biases and privacy concerns. To systematically address these issues in the context of geographic information recognition, we introduce a benchmark dataset consisting of 1,200 images paired with detailed geographic metadata. Evaluating four VLMs, we find that while these models demonstrate the ability to recognize geographic information from images, achieving up to $53.8\%$ accuracy in city prediction, they exhibit significant regional biases. Specifically, performance is substantially higher for economically developed and densely populated regions compared to less developed ($-12.5\%$) and sparsely populated ($-17.0\%$) areas. Moreover, the models exhibit regional biases, frequently overpredicting certain locations; for instance, they consistently predict Sydney for images taken in Australia. The strong performance of VLMs also raises privacy concerns, particularly for users who share images online without the intent of being identified. Our code and dataset are publicly available at https://github.com/uscnlp-lime/FairLocator.
- North America > United States > California > Los Angeles County > Los Angeles (0.15)
- Asia > India > Karnataka > Bengaluru (0.14)
- North America > United States > New York (0.06)
- (74 more...)
Cellpose+, a morphological analysis tool for feature extraction of stained cell images
Huaman, Israel A., Ghorabe, Fares D. E., Chumakova, Sofya S., Pisarenko, Alexandra A., Dudaev, Alexey E., Volova, Tatiana G., Ryltseva, Galina A., Ulasevich, Sviatlana A., Shishatskaya, Ekaterina I., Skorb, Ekaterina V., Zun, Pavel S.
Advanced image segmentation and processing tools present an opportunity to study cell processes and their dynamics. However, image analysis is often routine and time-consuming. Nowadays, alternative data-driven approaches using deep learning are potentially offering automatized, accurate, and fast image analysis. In this paper, we extend the applications of Cellpose, a state-of-the-art cell segmentation framework, with feature extraction capabilities to assess morphological characteristics. We also introduce a dataset of DAPI and FITC stained cells to which our new method is applied.
- North America > United States (0.15)
- Europe > Germany (0.05)
- Europe > Switzerland (0.04)
- (7 more...)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Materials > Chemicals > Commodity Chemicals (0.69)
- (3 more...)
Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media
Kristensen-McLachlan, Ross Deans, Hicke, Rebecca M. M., Kardos, Márton, Thunø, Mette
Does the People's Republic of China (PRC) interfere with European elections through ethnic Chinese diaspora media? This question forms the basis of an ongoing research project exploring how PRC narratives about European elections are represented in Chinese diaspora media, and thus the objectives of PRC news media manipulation. In order to study diaspora media efficiently and at scale, it is necessary to use techniques derived from quantitative text analysis, such as topic modelling. In this paper, we present a pipeline for studying information dynamics in Chinese media. Firstly, we present KeyNMF, a new approach to static and dynamic topic modelling using transformer-based contextual embedding models. We provide benchmark evaluations to demonstrate that our approach is competitive on a number of Chinese datasets and metrics. Secondly, we integrate KeyNMF with existing methods for describing information dynamics in complex systems. We apply this pipeline to data from five news sites, focusing on the period of time leading up to the 2024 European parliamentary elections. Our methods and results demonstrate the effectiveness of KeyNMF for studying information dynamics in Chinese media and lay groundwork for further work addressing the broader research questions.
- Media > News (1.00)
- Government > Regional Government > Europe Government (0.46)
Fair Railway Network Design
He, Zixu, Botan, Sirin, Lang, Jérôme, Saffidine, Abdallah, Sikora, Florian, Workman, Silas
When designing a public transportation network in a country, one may want to minimise the sum of travel duration of all inhabitants. This corresponds to a purely utilitarian view and does not involve any fairness consideration, as the resulting network will typically benefit the capital city and/or large central cities while leaving some peripheral cities behind. On the other hand, a more egalitarian view will allow some people to travel between peripheral cities without having to go through a central city. We define a model, propose algorithms for computing solution networks, and report on experiments based on real data.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)
- (64 more...)